Improving statistical machine translation by classifying and generalizing inflected verb forms
نویسندگان
چکیده
This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms by the lemma of their head verb, the data sparseness problem caused by highly-inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen verb forms. Translation results for an English to Spanish task are reported, producing a significant performance improvement.
منابع مشابه
Handling verb phrase morphology in highly inflected Indian languages for Machine Translation
The phrase based systems for machine translation are limited by the phrases that they see during the training. For highly inflected languages, it is uncommon to see all the forms of a word in the parallel corpora used during training. This problem is amplified for verbs in highly inflected languages where the correct form of the word depends on factors like gender, number and tense aspect. We p...
متن کاملStatistical Machine Translation with Scarce Resources Using Morpho-syntactic Information
In statistical machine translation, correspondences between the words in the source and the target language are learned from parallel corpora, and often little or no linguistic knowledge is used to structure the underlying models. In particular, existing statistical systems for machine translation often treat different inflected forms of the same lemma as if they were independent of one another...
متن کاملPhrase Linguistic Classification and Generalization for Improving Statistical Machine Translation
In this paper a method to incorporate linguistic information regarding single-word and compound verbs is proposed, as a first step towards an SMT model based on linguistically-classified phrases. By substituting these verb structures by the base form of the head verb, we achieve a better statistical word alignment performance, and are able to better estimate the translation model and generalize...
متن کاملReduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language
We address the problem of statistical machine translation from highly inflective language to less inflective one. The characteristics of inflective languages are generally not taken into account by the statistical machine translation system. Existing translation systems often treat different inflected word forms of the same lemma as if they were independent of each other, although some interdep...
متن کاملPhrase Pair Mappings for Hindi-English Statistical Machine Translation
In this paper, we present our work on the creation of lexical resources for the Machine Translation between English and Hindi. We describes the development of phrase pair mappings for our experiments and the comparative performance evaluation between different trained models on top of the baseline Statistical Machine Translation system. We focused on augmenting the parallel corpus with more voc...
متن کامل